Automatic Cluster Stopping with Criterion Functions and the Gap Statistic

نویسندگان

  • Ted Pedersen
  • Anagha Kulkarni
چکیده

SenseClusters is a freely available system that clusters similar contexts. It can be applied to a wide range of problems, although here we focus on word sense and name discrimination. It supports several different measures for automatically determining the number of clusters in which a collection of contexts should be grouped. These can be used to discover the number of senses in which a word is used in a large corpus of text, or the number of entities that share the same name. There are three measures based on clustering criterion functions, and another on the Gap Statistic.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Comparing different stopping criteria for fuzzy decision tree induction through IDFID3

Fuzzy Decision Tree (FDT) classifiers combine decision trees with approximate reasoning offered by fuzzy representation to deal with language and measurement uncertainties. When a FDT induction algorithm utilizes stopping criteria for early stopping of the tree's growth, threshold values of stopping criteria will control the number of nodes. Finding a proper threshold value for a stopping crite...

متن کامل

An Introduction to a New Criterion Proposed for Stopping GA Optimization Process of a Laminated Composite Plate

Several traditional stopping criteria in Genetic Algorithms (GAs) are applied to the optimization process of a typical laminated composite plate. The results show that neither of the criteria of the type of statistical parameters, nor those of the kinds of theoretical models performs satisfactorily in determining the interruption point for the GA process. Here, considering the configuration of ...

متن کامل

Cluster Stopping Rules For Word Sense Discrimination

As text data becomes plentiful, unsupervised methods for Word Sense Disambiguation (WSD) become more viable. A problem encountered in applying WSD methods is finding the exact number of senses an ambiguity has in a training corpus collected in an automated manner. That number is not known a priori; rather it needs to be determined based on the data itself. We address that problem using cluster ...

متن کامل

Unsupervised Domain Adaptation for I-vector Speaker Recognition

In this paper, we present a framework for unsupervised domain adaptation of PLDA based i-vector speaker recognition systems. Given an existing out-of-domain PLDA system, we use it to cluster unlabeled in-domain data, and then use this data to adapt the parameters of the PLDA system. We explore two versions of agglomerative hierarchical clustering that use the PLDA system. We also study two auto...

متن کامل

Automatic concept identification in goal-oriented conversations

We address the problem of identifying key domain concepts automatically from an unannotated corpus of goal-oriented human-human conversations. We examine two clustering algorithms, one based on mutual information and another one based on Kullback-Liebler distance. In order to compare the results from both techniques quantitatively, we evaluate the outcome clusters against reference concept labe...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006